Abstract. We are interested in optimally driving a dynamical system that
can be influenced by exogenous noises. This is generally called a Stochastic
Optimal Control (SOC) problem and the Dynamic Programming (DP) principle
is the natural way of solving it. Unfortunately, DP faces the so-called
curse of dimensionality: the complexity of solving DP equations grows exponentially
with the dimension of the information variable that is sufficient to
take optimal decisions (the state variable).

For a large class of SOC problems, which includes important practical prob- 
lems, we propose an original way of obtaining strategies to drive the system. 
The algorithm we introduce is based on Lagrangian relaxation, of which the 
application to decomposition is well-known in the deterministic framework. 
However, its application to such closed-loop problems is not straightforward 
and an additional statistical approximation concerning the dual process is 
needed. We give a convergence proof that follows directly from classical results
on duality in optimization, and highlight the error made by our
approximation. Numerical results are also provided on a large-scale SOC
problem. This idea extends the original DADP algorithm that was presented 
by Barty, Carpentier, and Girardeau (2010). 



Introduction 

Consider a controlled dynamical system over a discrete and finite time horizon. 
This system may be influenced by exogenous noises that affect its behaviour. We 
suppose that, at every instant, the decision maker is able to observe these noises 
and to keep these observations in memory. Since it is generally profitable to take 
available observations into account when designing future decisions, we are looking 
for strategies rather than simple decisions. Such strategies (or policies) are feedback 
functions that map every instant and every possible history of the system to a 
decision to be made. 

More precisely, we are here interested in optimization problems with a large 
number of variables. The typical application we have in mind is the following. 
Consider a power producer that owns a certain number of power units. Each unit 
has its own local characteristics such as physical constraints that restrain the set of 
feasible decisions, and production costs that depend on the type of fuel that is used 
to produce power. The power producer has to control the power units so that a 
global power demand is met at every instant. The power demand, as well as other 
parameters such as inflows in water reservoirs or unit breakdowns, are random. 
Naturally, he is looking for strategies that make the production cost minimal, over 
a given time horizon. In such a problem, both the number of power units and the 
number of time steps are usually large. 
One classical approach when dealing with stochastic dynamic optimization problems
is to discretize the random inputs of the problem using scenario trees. Such
an approach has been widely studied within the Stochastic Programming community
(see the book by Shapiro, Dentcheva, and Ruszczyński, 2009, for an overview
of this methodology). One of the advantages of such a technique is that, once
the scenario tree is drawn, the derived problem can be treated by classical Mathematical
Programming techniques. Thus, a number of decomposition methodologies
have been proposed (Higle and Sen, 1996; Carpentier, Cohen, Culioli, and Renaud,
1996; Ruszczyński and Shapiro, 2003, Chapter 3) and even applied to energy planning
problems (Bacaud, Lemaréchal, Renaud, and Sagastizábal, 2001). A general
theoretical point of view on the way to combine the discretization of expectation
with the discretization of information is given by Barty (2004).
However, in a multi-stage setting, this methodology suffers from the drawbacks that
arise with scenario trees. As pointed out by Shapiro (2006), the number of
scenarios needed to achieve a given accuracy grows exponentially with the number
of time steps of the problem.

The other natural approach to solve SOC problems is to rely on the Dynamic 
Programming (DP) principle (see Bellman, 1957, Bertsekas, 2000). The core of 
the DP approach is the definition of a state variable that is, roughly speaking, the 
variable that, in conjunction with the time variable, is sufficient to take an optimal
decision at every instant. It does not suffer from the drawback of scenario trees
regarding the number of time steps, since strategies then depend
on a state variable whose dimension usually does not grow with time¹.
However, DP suffers from another drawback, the so-called curse of
dimensionality: the complexity of solving the DP equation grows exponentially with
the state space dimension. Hence, solving the DP equation directly is generally
intractable when the state space dimension exceeds a few units. Recently,
Vezolle, Vialle, and Warin (2009) were able to solve it on a 10-state-variable energy
management problem, using parallel computation coupled with adequate data
distribution.

Another popular idea is to represent the value functions (solutions of the DP 
equation) as a linear combination of a priori chosen basis functions (see among oth- 
ers Bellman and Dreyfus, 1959, Bertsekas and Tsitsiklis, 1996, Sect. 6.5). This ap- 
proach, called Approximate Dynamic Programming or often Least-Squares Monte- 
Carlo, has also become very popular in the context of American option pricing 
through the work of Longstaff and Schwartz (2001). This approximation reduces 
the complexity of solving the DP equation drastically. However, in order to be 
practically efficient, such an approach requires some a priori information about the 
problem in order to define a well-suited functional subspace. Indeed, there is no
systematic means to choose the basis functions and several choices have been pro- 
posed in the literature (de Farias and Van Roy, 2003, Tsitsiklis and Van Roy, 1996, 
Bouchard and Warin, 2010). 

When dealing with large-scale optimization problems, the decomposition/coordi- 
nation approach aims at finding a solution to the original problem by iteratively 
solving smaller-dimensional subproblems. In the deterministic case, several types 
of decomposition have been proposed (e.g. by prices or by quantities) and unified 
in a general framework using the Auxiliary Problem Principle by Cohen (1980a). 
In the open-loop stochastic case, i.e. when controls do not rely on any observation, 
Cohen and Culioli (1990) proposed to take advantage of both decomposition techniques
and stochastic gradient algorithms. These techniques have been extended
to the closed-loop stochastic case by Barty, Roy, and Strugarek (2009), but so far
they fail to provide decomposed state-dependent strategies in the Markovian case.
This is because a subproblem optimal strategy depends on the state of the whole 
system, not only on the local state. In other words, decomposition approaches are 
meant to decompose the control space, namely the range of the strategy, but the 
numerical complexity of the problems we consider here also arises because of the 
dimensionality of the state space, that is to say the domain of the strategy. 

We here propose a way to use price decomposition within the closed-loop stochas- 
tic case. The coupling constraints, namely the constraints preventing the problem 
from being naturally decomposed, are dualized using a Lagrange multiplier (price). 
At each iteration, the price decomposition algorithm solves each subproblem using 
the current price, then uses the solutions to update the price. In the stochastic con- 
text, price is a random process whose dynamics is not available, so the subproblems 
do not in general fall into the Markovian setting. However, in a specific instance 
of this problem, Strugarek (2006) exhibited dynamics for the optimal multiplier,
and showed that these dynamics did not depend on the decision
variables. Hence it was possible to come down to the Markovian framework and 
to use DP to solve the subproblems in this case. Following this idea, Barty et al. 
(2010) proposed to choose a parametrized dynamics for these multipliers in such 
a way that solving subproblems using DP becomes possible. While the approach, 
called Dual Approximate Dynamic Programming (DADP), showed promising re- 
sults on numerical examples, it suffers from the fact that the induced restrained 
dual space is non-convex. This led to some numerical instabilities and, probably
more importantly, it was not possible to give convergence results for the algorithm.
We here propose to extend DADP in a more general way that allows us to derive 
convergence results and solves the problem of numerical instabilities. 

The paper is organized as follows. In Section 1, we present the general SOC 
problem and the DP principle. Then we concentrate on a more specific class of 
problems, that we call decomposable problems, and recall the previous version of 
the DADP algorithm. In Section 2, we present the new version we propose and give 
convergence results for the algorithm. Finally, in Section 3, we apply DADP to two 
numerical examples, the first being the one from the previous paper by Barty et al. 
(2010) and the second one being a more realistic power management example. 

1. Mathematical formulation 

1.1. General problem setting. Throughout the paper, random variables are denoted
using bold letters. Consider a discrete and finite time horizon $0, 1, \dots, T$
and a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. To define a stochastic dynamical system, we
need:

• a stock process $\mathbf{X} = (\mathbf{X}_0, \dots, \mathbf{X}_T)$ which represents the physical states of
the system through time, the value of $\mathbf{X}_t$ lying, at every instant $t$, in a
Hilbert space $\mathbb{X}_t$;

• a control process $\mathbf{U} = (\mathbf{U}_0, \dots, \mathbf{U}_{T-1})$, the value of $\mathbf{U}_t$ lying, at every
instant $t$, in a Hilbert space $\mathbb{U}_t$;

• a noise process $\mathbf{W} = (\mathbf{W}_0, \dots, \mathbf{W}_{T-1})$, the value of $\mathbf{W}_t$ lying, at every
instant $t$, in a Hilbert space $\mathbb{W}_t$.

The spaces $\mathbb{X}_t$, $\mathbb{U}_t$ and $\mathbb{W}_t$ are generally finite-dimensional. In the sequel,
we suppose $\mathbb{X}_t = \mathbb{R}^n$ and $\mathbb{U}_t = \mathbb{R}^m$. Since the decision variable $\mathbf{U}_t$ is a random
variable, and since our purpose is to use variational techniques that require the notion
of gradient, it is natural to suppose that $\mathbf{U}_t$ lies in a Hilbert space $\mathcal{U}_t$, for
example $L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{U}_t)$.

The three types of variables are linked together in the following way. At every
time step $t$, there exists a function $f_t$ (the dynamics of the system) that maps the
triplet $(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t)$ to the next stock value $\mathbf{X}_{t+1}$. Let $(\mathcal{A}_0, \dots, \mathcal{A}_{T-1})$ be the
filtration associated with the stochastic process $\mathbf{W}$. We suppose that, at every
time step $t$, the decision maker is able to observe and to keep in memory all the
past history of $\mathbf{W}$ up to time $t$. The causality principle states that the decision $\mathbf{U}_t$
at time $t$ is $\mathcal{A}_t$-measurable, i.e. only depends on past observations. Moreover, at
each time step $t$, a cost $C_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t)$ is incurred. Finally, at the final time $T$, a
cost $K(\mathbf{X}_T)$ is added. The Stochastic Optimal Control (SOC) problem we would
like to solve hence reads:

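$$\min_{\mathbf{X}, \mathbf{U}} \; \mathbb{E}\Big( \sum_{t=0}^{T-1} C_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t) + K(\mathbf{X}_T) \Big), \tag{1a}$$

subject to dynamics constraints:

$$\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t), \quad \forall t = 0, \dots, T-1, \tag{1b}$$

$$\mathbf{X}_0 \text{ is given}, \tag{1c}$$

as well as bound constraints:

$$\underline{x}_t \le \mathbf{X}_t \le \overline{x}_t, \quad \forall t = 1, \dots, T, \tag{1d}$$

$$\underline{u}_t \le \mathbf{U}_t \le \overline{u}_t, \quad \forall t = 0, \dots, T-1, \tag{1e}$$

static constraints:

$$g_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t) = 0, \quad \forall t = 0, \dots, T-1, \tag{1f}$$

and the non-anticipativity constraint:

$$\mathbf{U}_t \text{ is } \mathcal{A}_t\text{-measurable}, \quad \forall t = 0, \dots, T-1. \tag{1g}$$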

Constraints (1b), (1d), (1e) and (1f) have to be understood in the $\mathbb{P}$-almost sure
sense. We give examples for constraint (1f) in §2. With no further assumptions,
Problem (1) generally cannot be solved analytically, except in quite particular
cases such as the Linear Quadratic Gaussian (LQG) case.
One has to be aware that, when solving this problem, one is looking for functions
that map every possible history of the system to a decision; the domain of such
a function clearly grows with time, and representing it on a computer rapidly
becomes intractable.

1.2. The Dynamic Programming Principle. Fortunately enough, control the- 
ory helps us reduce the size of the optimal strategy's domain in some cases. Let us 
first make the following assumption. 

Assumption 1. The noises $\mathbf{W}_0, \dots, \mathbf{W}_{T-1}$ are independent over time.
Now define functions $V_t$, for every time step $t = 0, \dots, T$, as:

$$V_t(x) = \min_{\mathbf{U}_t, \dots, \mathbf{U}_{T-1}} \mathbb{E}\Big( \sum_{s=t}^{T-1} C_s(\mathbf{X}_s, \mathbf{U}_s, \mathbf{W}_s) + K(\mathbf{X}_T) \Big), \quad \text{with } \mathbf{X}_t = x,$$

subject to the same constraints² as in Problem (1). Function $V_t$ represents the
minimal remaining cost of the problem when starting at time $t$, for every possible
stock value $x$.

Under Assumption 1, the Dynamic Programming (DP) principle states that the
variable $\mathbf{X}_t$, along with the current noise value $\mathbf{W}_t$, contains all the information
that is sufficient to take the optimal decision at time $t$, hence the term state variable.



² While starting at time $t$.



PRICE DECOMPOSITION IN LARGE-SCALE STOCHASTIC OPTIMAL CONTROL 5 



Moreover, it provides a way to compute the functions $V_t$, which we now call Bellman
functions (or value functions), as well as optimal strategies, in a backward manner:

$$V_T(x) = K(x), \quad \forall x \in \mathbb{X}_T, \tag{2a}$$

and, for every time step $t = T-1, \dots, 0$:

$$V_t(x) = \mathbb{E}\Big( \min_{u} \; C_t(x, u, \mathbf{W}_t) + V_{t+1}\big(f_t(x, u, \mathbf{W}_t)\big) \Big), \quad \forall x \in \mathbb{X}_t. \tag{2b}$$

Compared with the original setting where the optimal strategy domain was growing 
along with time steps, the DP principle drastically reduces the size of the informa- 
tion needed to make an optimal decision. 
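A minimal Python sketch of the backward recursion (2a)-(2b) may help fix ideas. It assumes, for illustration only, a scalar stock discretized on a grid, a finite control grid, and a noise $\mathbf{W}_t$ with finite support and the same law at every $t$ (independent over time, as in Assumption 1). The function name `backward_dp` and its arguments are hypothetical. $V_{t+1}$ is evaluated off the grid by linear interpolation, controls that push the stock outside the bounds (1d) are discarded, and at least one feasible control is assumed to exist at every grid point. Since the minimum sits inside the expectation in (2b), the control is chosen separately for each noise value before averaging.

```python
import numpy as np

def backward_dp(x_grid, u_grid, w_vals, w_probs, cost, dynamics, final_cost, T):
    """Hypothetical grid-based solver for (2a)-(2b) with a scalar stock.

    cost(t, x, u, w) and dynamics(t, x, u, w) must broadcast over arrays;
    W_t takes the values w_vals with probabilities w_probs at every t.
    Returns V with V[t, j] approximating V_t(x_grid[j]).
    """
    V = np.empty((T + 1, x_grid.size))
    V[T] = final_cost(x_grid)                       # (2a): V_T = K
    x = x_grid[:, None, None]                       # axis 0: stock
    w = w_vals[None, :, None]                       # axis 1: noise
    u = u_grid[None, None, :]                       # axis 2: control
    for t in range(T - 1, -1, -1):                  # t = T-1, ..., 0
        x_next = dynamics(t, x, u, w)
        future = np.interp(x_next, x_grid, V[t + 1])
        q = cost(t, x, u, w) + future
        # Discard controls that violate the stock bounds (1d).
        feasible = (x_next >= x_grid[0]) & (x_next <= x_grid[-1])
        q = np.where(feasible, q, np.inf)
        # Minimize over u for each (x, w), then take the expectation over W_t.
        V[t] = q.min(axis=2) @ w_probs
    return V
```

For instance, with a stock grid $\{0,\dots,4\}$, releases $u \in \{0,1,2\}$, inflows $w \in \{0,1\}$ with equal probability, dynamics $x + w - u$, a one-step horizon, a cost $-u$ and a zero final cost, the recursion should give $V_0 = (-0.5, -1.5, -2, -2, -2)$: a low stock cannot always release the maximal amount.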

Remark 1 (About independence over time). In the case when the model is such
that the noises that affect the system have some sort of correlation through time, one
can always make the dynamics of the noise variable explicit and add it to the dynamics
of $\mathbf{X}_t$, thus defining a new (albeit larger!) state variable as well as a new noise
variable that is now independent over time.

Remark 2 (Hazard-Decision setting). The reader may have noticed that the way
the non-anticipativity constraint is written allows the decision maker at time $t$ to
observe the current noise value $\mathbf{W}_t$ before choosing the control $\mathbf{U}_t$. In such a setting,
the optimal decision at time $t$ depends on both the state variable $\mathbf{X}_t$ and the noise
variable $\mathbf{W}_t$, whereas the value function only depends on the state variable $\mathbf{X}_t$.

Note however that the dimension of the state space $\mathbb{X}_t$ might still be quite large,
and the complexity of solving the DP equation (2) grows exponentially with the
dimension of $\mathbb{X}_t$; this unpleasant feature is well known as the curse of dimensionality
and prevents us from solving this equation by discretization when the state space
dimension is, say, greater than 5.
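As a rough order of magnitude, under an assumption of ours (a uniform grid with $k$ points per state component, which is not a figure from the paper), the number of grid points at which (2b) must be evaluated at each time step is

$$k^{n}, \qquad \text{e.g. } k = 100,\ n = 5 \;\Longrightarrow\; 100^{5} = 10^{10},$$

and each of these points requires a minimization over $u$ and an expectation over $\mathbf{W}_t$.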

1.3. Decomposable problem setting. Let us now present a particular instance
of Problem (1) for which we are able to further reduce the size of the information
needed to take a reasonable decision.

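We consider a system which consists of $N$ subsystems³, whose dynamics and
cost functions are independent of one another. More precisely, the state $\mathbf{X}_t$
(respectively the control $\mathbf{U}_t$) of the global system writes $(\mathbf{X}_t^1, \dots, \mathbf{X}_t^N)$
with $\mathbf{X}_t^i \in L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^{n_i})$ (resp. $(\mathbf{U}_t^1, \dots, \mathbf{U}_t^N)$
with $\mathbf{U}_t^i \in L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^{m_i})$) and $n = \sum_{i=1}^N n_i$
(resp. $m = \sum_{i=1}^N m_i$), so that the global dynamics $\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t)$
can be written independently unit by unit: $\mathbf{X}_{t+1}^i = f_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t)$, $i = 1, \dots, N$.
In the same way, the global cost $C_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_t)$ is equal to the sum of the local
unit costs $C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t)$, $i = 1, \dots, N$. At the end of the time period, each unit $i$
causes a cost $K^i$ that only depends on its final state $\mathbf{X}_T^i$.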

Note that, without further constraints, the induced SOC problem can be
stated independently unit by unit, even though the same noise variable affects all units
(see Appendix B for a precise proof). Hence, under Assumption 1, solving the
DP equation can be decomposed unit by unit. For each unit, the optimal strategy
depends only on its local state⁴, whose dimension is usually far smaller than the dimension of
the global state space.

Consider now a static constraint (1f) that couples the units together. We suppose
that such a coupling arises from a set of static $\mathbb{R}^d$-valued constraints, the constraint
at time step $t$ reading $\sum_{i=1}^N g_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) = 0$. This kind of coupling constraint is
natural in many industrial applications, including the power management
problem already mentioned in the introduction: the sum of the productions
of the power units must meet an uncertain power demand.

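The decomposable problem we are interested in solving in the following reads:

$$\min_{\mathbf{X}, \mathbf{U}} \mathbb{E}\Big( \sum_{t=0}^{T-1} \sum_{i=1}^N C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) + \sum_{i=1}^N K^i(\mathbf{X}_T^i) \Big), \tag{3a}$$

subject to dynamics constraints:

$$\mathbf{X}_{t+1}^i = f_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t), \quad \forall t = 0, \dots, T-1, \; \forall i = 1, \dots, N, \tag{3b}$$

$$\mathbf{X}_0^i \text{ is given}, \quad \forall i = 1, \dots, N, \tag{3c}$$

as well as bound constraints:

$$\underline{x}_t^i \le \mathbf{X}_t^i \le \overline{x}_t^i, \quad \forall t = 1, \dots, T, \; \forall i = 1, \dots, N, \tag{3d}$$

$$\underline{u}_t^i \le \mathbf{U}_t^i \le \overline{u}_t^i, \quad \forall t = 0, \dots, T-1, \; \forall i = 1, \dots, N, \tag{3e}$$

static constraints:

$$\sum_{i=1}^N g_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) = 0, \quad \forall t = 0, \dots, T-1, \tag{3f}$$

and the non-anticipativity constraint:

$$\mathbf{U}_t^i \text{ is } \mathcal{A}_t\text{-measurable}, \quad \forall t = 0, \dots, T-1, \; \forall i = 1, \dots, N. \tag{3g}$$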

There are three types of coupling in Problem (3): 

• The first comes from the state dynamics (3b) that induce a temporal cou- 
pling. 

• The second one arises from the static constraints (3f) that induce a spatial 
coupling: they link together all the subsystems at each time step t. 

• The third type of coupling is informational: it comes from the causality
constraint (3g), which prevents us from decomposing directly scenario by
scenario: if two realizations of the noise process are identical up to time $t$,
then the same control has to be applied at time $t$ on both realizations.

Constraints (3f) prevent us from decomposing the optimization problem unit
by unit: the solution $\mathbf{U}_t^i$ for unit $i$ and time $t$ has to be searched as a feedback
function $\varphi_t^i$ depending on the current noise value and on the whole stock variable
$\mathbf{X}_t = (\mathbf{X}_t^1, \dots, \mathbf{X}_t^N)$ rather than on the local stock variable $\mathbf{X}_t^i$ only. Adding the
coupling constraint (3f) drastically changes the structure of the problem.

Remark 3 (Local and global noises). Applications we have in mind are power management
problems which are completely "flower-shaped", in the following sense.
The noise variable $\mathbf{W}_t$ at time $t$ is composed of two different kinds of noise:

• a local noise $\mathbf{W}_t^i$ for every subsystem $i$, i.e. at every petal of the flower (uncertain
inflows entering a water reservoir, for instance);

• a global noise $\mathbf{D}_t$ at the center of the flower (a total power demand, for
instance).

In such a setting, only the local noise appears in the cost function and in the
dynamics, leading to functions of the form:
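$$C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t^i) \quad \text{and} \quad f_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t^i),$$

while the global noise appears only in the coupling constraint as, for instance:

$$\sum_{i=1}^N \theta_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i) = \mathbf{D}_t.$$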

Keeping this particular case in mind gives some insight into how to decompose
the global problem as well as possible. This is explained in more detail
in §2.1, and such settings are treated in the numerical experiments of §3.

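1.4. Previous paper. In a previous study (Barty et al., 2010), the authors proposed
a way of handling Problem (3) by approximate Lagrangian decomposition.
The proposed algorithm, called Dual Approximate Dynamic Programming (DADP),
is as follows. Let us introduce the Lagrangian of Problem (3):

$$\mathcal{L}(\mathbf{X}, \mathbf{U}, \boldsymbol{\lambda}) := \mathbb{E}\Big( \sum_{t=0}^{T-1} \sum_{i=1}^N \big( C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) + \boldsymbol{\lambda}_t^\top g_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) \big) + \sum_{i=1}^N K^i(\mathbf{X}_T^i) \Big),$$

with $\boldsymbol{\lambda}_t \in L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^d)$ the Lagrange multiplier of the coupling constraint (3f)
and $\boldsymbol{\lambda} := (\boldsymbol{\lambda}_0, \dots, \boldsymbol{\lambda}_{T-1})$. Note that, since the dualized constraint is $\mathcal{A}_t$-measurable,
the Lagrange multiplier $\boldsymbol{\lambda}_t$ need only have the same measurability.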
Problem (3) is always equivalent to:

$$\min_{\mathbf{X}, \mathbf{U}} \max_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{X}, \mathbf{U}, \boldsymbol{\lambda}),$$

where the minimization is subject to all constraints of Problem (3) except constraint
(3f). If $\mathcal{L}$ has a saddle point (see Appendix A for a definition and a characterization
of saddle points), then this problem is equivalent to the so-called dual
problem:

$$\max_{\boldsymbol{\lambda}} \min_{\mathbf{X}, \mathbf{U}} \mathcal{L}(\mathbf{X}, \mathbf{U}, \boldsymbol{\lambda}), \tag{4}$$

under, once again, the same constraints as in Problem (3) except the coupling
constraint (3f).

The key point of the so-called price decomposition algorithm is that the inner
minimization problem can be split into $N$ subproblems, each one involving a single
subsystem (once again, see Appendix B for more details). One might think that
solving these subproblems is much simpler than solving the original global problem.
This is not the case here: because the dual variable $\boldsymbol{\lambda}$ is a stochastic process
that depends in general on the whole history of the system, we cannot reasonably
make the independence-over-time assumption that leads to the DP principle, and the
subproblems are just as hard as Problem (1)!
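To illustrate the price decomposition principle in its simplest, deterministic form, here is a hypothetical toy instance (not taken from the paper): minimize $\sum_i c_i u_i^2/2$ subject to $\sum_i u_i = d$, written as $\sum_i g^i(u_i) = 0$ with $g^i(u_i) = u_i - d/N$. For a fixed price $\lambda$, each subproblem $\min_u c_i u^2/2 + \lambda (u - d/N)$ is solved independently in closed form, and the price is then updated by a gradient step on the dual function, as in Uzawa's algorithm. In this sketch the step size `rho` must lie below $2/\sum_i 1/c_i$ for the iteration to converge.

```python
import numpy as np

def uzawa_quadratic(c, d, rho=0.5, iters=200):
    """Hypothetical toy: min sum_i c_i u_i^2 / 2  s.t.  sum_i u_i = d,
    with coupling g^i(u_i) = u_i - d / N dualized by a scalar price lam."""
    c = np.asarray(c, dtype=float)
    lam = 0.0
    for _ in range(iters):
        # Subproblems, solved independently for the current price:
        # argmin_u c_i u^2 / 2 + lam * (u - d / N)  =>  u_i = -lam / c_i.
        u = -lam / c
        # Coordination: gradient ascent on the dual function
        # (the gradient is the coupling residual sum_i g^i(u_i)).
        lam += rho * (u.sum() - d)
    return u, lam
```

With `c = [1, 2, 4]` and `d = 7`, the iteration should return $u \approx (4, 2, 1)$ and $\lambda \approx -4$. The subproblems never see each other: only the price carries the coupling, which is exactly what breaks down in the stochastic case when $\boldsymbol{\lambda}$ depends on the whole history.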

The idea of Barty et al. (2010) is to force the dual process to satisfy prescribed
dynamics:

$$\boldsymbol{\lambda}_0 = h_{\alpha_0}(\mathbf{W}_0), \tag{5a}$$

$$\boldsymbol{\lambda}_{t+1} = h_{\alpha_{t+1}}(\boldsymbol{\lambda}_t, \mathbf{W}_{t+1}), \quad \forall t = 0, \dots, T-2, \tag{5b}$$

where $h_{\alpha_t}$ is an a priori chosen function parametrized by $\alpha_t \in \mathbb{R}^q$. We note
$\alpha = (\alpha_0, \dots, \alpha_{T-1})$. Given a vector $\alpha^k$ of coefficients at iteration $k$ of the algorithm,
which defines the current values of the dual variables, the first step of DADP is
to solve the $N$ subproblems by DP with state $(\mathbf{X}_t^i, \boldsymbol{\lambda}_t)$. In order to update the
Lagrange multipliers, the authors propose to draw $S$ trajectory samples of the
noise $\mathbf{W}$ and integrate the dynamics (3b)-(3c) and (5) using the optimal feedback
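laws obtained at the first step, thus obtaining $S$ sample trajectories of $\mathbf{X}$, $\mathbf{U}$
and $\boldsymbol{\lambda}$. A gradient step is then performed sample by sample:

$$\lambda_t^{k+1,s} = \lambda_t^{k,s} + \rho_t \sum_{i=1}^N g_t^i\big(X_t^{i,k,s}, U_t^{i,k,s}, W_t^s\big), \quad \forall s = 1, \dots, S,$$

with $\rho_t$ obeying the rules of the step-size choice in Uzawa's algorithm (see Appendix
A). Finally, we solve the following regression problem:

$$\min_{\alpha_0, \dots, \alpha_{T-1}} \sum_{s=1}^S \Big( \big\| h_{\alpha_0}(W_0^s) - \lambda_0^{k+1,s} \big\|^2 + \sum_{t=0}^{T-2} \big\| h_{\alpha_{t+1}}(\lambda_t^{k+1,s}, W_{t+1}^s) - \lambda_{t+1}^{k+1,s} \big\|^2 \Big).$$

The last minimization produces coefficients $\alpha^{k+1}$ which define, using Equation (5),
a new process $\boldsymbol{\lambda}^{k+1}$.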
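The regression step of the original DADP can be sketched in Python under an extra assumption that is ours, not the paper's: scalar multipliers and scalar noise, with an affine parametrization $h_{\alpha_0}(w) = \alpha_0^{(0)} + \alpha_0^{(1)} w$ and $h_{\alpha_{t+1}}(\lambda, w) = \alpha_{t+1}^{(0)} + \alpha_{t+1}^{(1)} \lambda + \alpha_{t+1}^{(2)} w$. Since each $\alpha_t$ appears in a single term of the objective, the regression then splits into independent linear least-squares problems, one per time step. The helper name `fit_dual_dynamics` is hypothetical.

```python
import numpy as np

def fit_dual_dynamics(lam, w):
    """Least-squares fit of a hypothetical affine dual dynamics
        lambda_0     = a_0[0] + a_0[1] * w_0
        lambda_{t+1} = a_{t+1}[0] + a_{t+1}[1] * lambda_t + a_{t+1}[2] * w_{t+1}
    lam, w: arrays of shape (S, T) holding lambda_t^{k+1,s} and W_t^s.
    Returns the list [alpha_0, ..., alpha_{T-1}].
    """
    S, T = lam.shape
    ones = np.ones(S)
    # First term of the regression objective only involves alpha_0.
    A0 = np.column_stack([ones, w[:, 0]])
    alphas = [np.linalg.lstsq(A0, lam[:, 0], rcond=None)[0]]
    # Each remaining term only involves alpha_{t+1}: fit them one by one.
    for t in range(T - 1):
        A = np.column_stack([ones, lam[:, t], w[:, t + 1]])
        alphas.append(np.linalg.lstsq(A, lam[:, t + 1], rcond=None)[0])
    return alphas
```

If the sampled multipliers happen to follow an affine dynamics exactly, the fit recovers its coefficients; in general the projection onto such a parametrized family is where the non-convexity discussed next comes from, as soon as $h_\alpha$ is not linear in the pair $(\alpha, \lambda)$ jointly.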

This procedure has several advantages, notably that its complexity is linear 
with respect to the number N of subproblems and that it may lead, depending 
on the choice for the dual dynamics h, to tractable approximations of the original 
problem. The authors illustrate this fact on a small example on which they are 
able to compare standard DP and DADP. 

Still, it has some drawbacks, mainly theoretical. First of all, the shape of the
dynamics introduced for the dual process is chosen arbitrarily, once and for all, and
the quality of the result depends on this choice. Moreover, these dynamics define
a subset which is non-convex. Since the next iterate $\boldsymbol{\lambda}^{k+1}$ is a projection onto this
subset, it is not uniquely defined, and some oscillations observed in practice may be
due to this fact. Finally, this non-convexity prevents us from obtaining convergence
results for this algorithm.



2. Dual Approximate Dynamic Programming revisited 

We now propose a new version of the DADP algorithm and show how it overcomes
the above-mentioned drawbacks encountered with the original algorithm. In
this new approach, we no longer suppose given dynamics for the multipliers.
Still, we use the standard price decomposition algorithm and perform the update
of the multipliers scenario-wise using the classical gradient step:

$$\lambda_t^{k+1,s} = \lambda_t^{k,s} + \rho_t \sum_{i=1}^N g_t^i\big(X_t^{i,k,s}, U_t^{i,k,s}, W_t^s\big), \quad \forall s = 1, \dots, S.$$

The difficulty is now to solve the subproblems, as explained in §2.1.

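2.1. Projection of the dual process. After Lagrangian decomposition of Problem
(3) with a given multiplier $\boldsymbol{\lambda}$, the $i$-th subproblem reads:

$$\min_{\mathbf{X}^i, \mathbf{U}^i} \mathbb{E}\Big( \sum_{t=0}^{T-1} \big( C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) + \boldsymbol{\lambda}_t^\top g_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) \big) + K^i(\mathbf{X}_T^i) \Big), \tag{6a}$$

subject to dynamics constraints:

$$\mathbf{X}_{t+1}^i = f_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t), \quad \forall t = 0, \dots, T-1, \tag{6b}$$

$$\mathbf{X}_0^i \text{ is given}, \tag{6c}$$

as well as bound constraints:

$$\underline{x}_t^i \le \mathbf{X}_t^i \le \overline{x}_t^i, \quad \forall t = 1, \dots, T, \tag{6d}$$

$$\underline{u}_t^i \le \mathbf{U}_t^i \le \overline{u}_t^i, \quad \forall t = 0, \dots, T-1, \tag{6e}$$

and the non-anticipativity constraint:

$$\mathbf{U}_t^i \text{ is } \mathcal{A}_t\text{-measurable}, \quad \forall t = 0, \dots, T-1. \tag{6f}$$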

As already mentioned, since the dual stochastic process $\boldsymbol{\lambda}$ generally depends
on the whole history of the process, solving this problem is in general as
complex as solving the original problem. In order to bypass this difficulty, let us
choose at each time step $t$ a random variable $\mathbf{Y}_t^i$ that is measurable with respect
to $\mathcal{A}_t$. We call $\mathbf{Y}^i = (\mathbf{Y}_0^i, \dots, \mathbf{Y}_{T-1}^i)$ the information process for subsystem $i$.
The idea is to rely on a short-memory process $\mathbf{Y}^i$. Note that we require that this
random process is not influenced by controls. We propose to replace Problem (6)
by:

$$\min_{\mathbf{X}^i, \mathbf{U}^i} \mathbb{E}\Big( \sum_{t=0}^{T-1} \big( C_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) + \mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i)^\top g_t^i(\mathbf{X}_t^i, \mathbf{U}_t^i, \mathbf{W}_t) \big) + K^i(\mathbf{X}_T^i) \Big), \tag{7}$$

subject to constraints (6b)-(6f).
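Problem (7) only requires the conditional expectation $\mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i)$. When the multiplier is only known through the $S$ scenario-wise samples $\lambda_t^{k,s}$, one simple way to estimate it, shown here as an assumption rather than as the method used in the paper, is to discretize a scalar $\mathbf{Y}_t^i$ into quantile bins and average the samples within each bin. Kernel or regression estimators would be natural alternatives. The function name `conditional_expectation` is hypothetical.

```python
import numpy as np

def conditional_expectation(lam, y, n_bins=10):
    """Monte Carlo estimate of E(lambda_t | Y_t), evaluated at each sample.

    lam, y: arrays of shape (S,) with the sampled multipliers and the sampled
    (scalar) information variable. Y_t is discretized into quantile bins and
    lambda_t is averaged within each bin.
    """
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index in 0..n_bins-1; only interior edges are used so that the
    # smallest and largest samples fall inside the first and last bins.
    idx = np.searchsorted(edges[1:-1], y, side="right")
    sums = np.bincount(idx, weights=lam, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = sums / np.maximum(counts, 1)        # empty bins are never indexed
    return means[idx]
```

With a constant information variable (the minimal choice discussed in Example 2 below), every sample falls into the same bin and the estimate reduces to the empirical mean of $\lambda_t$.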

Let us first examine the special situation in which the information variable $\mathbf{Y}_t^i$
only depends on the current noise $\mathbf{W}_t$. The process $\mathbf{Y}^i$ does not add memory to
the system, so that Problem (7) can be solved using the standard DP equation:

$$V_T^i(x) = K^i(x), \quad \forall x \in \mathbb{X}_T^i,$$

$$V_t^i(x) = \mathbb{E}\Big( \min_{u} \; C_t^i(x, u, \mathbf{W}_t) + \mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i)^\top g_t^i(x, u, \mathbf{W}_t) + V_{t+1}^i\big(f_t^i(x, u, \mathbf{W}_t)\big) \Big), \quad \forall x \in \mathbb{X}_t^i.$$

The expectation quadrature only involves the noise variable $\mathbf{W}_t$. Remember, as
explained in Remark 2, that we are in the "hazard-decision" setting: even though
the control at each instant $t$ depends on both $\mathbf{X}_t^i$ and $\mathbf{W}_t$, the Bellman function
only depends on $\mathbf{X}_t^i$.

Because the information variables $\mathbf{Y}_t^i$ are independent over time, we only have
to solve DP equations whose dimension is the subsystem dimension $n_i$. Let us give
three examples of choices for $\mathbf{Y}_t^i$.

Example 1 (Maximal information). One can choose to include in $\mathbf{Y}_t^i$ all the noise
at time $t$. As already explained in Remark 3, the cost function and dynamics of
a subsystem may only depend on a part of the whole noise $\mathbf{W}_t$ (a kind of local
information denoted by $\mathbf{W}_t^i$ in Remark 3). Yet some global noise, denoted by $\mathbf{D}_t$
in Remark 3, may appear in the coupling constraint (e.g. a global power demand).
Hence this maximal choice for the information variable makes the multiplier depend
on both local and global information: this should improve the subsystem's vision of
the rest of the system, and hence its strategy. Note, however, that including
all the noise at time $t$ in the information variable is only possible in practice
when the noise dimension is not too large. Indeed, the information variable appears
in a conditional expectation, whose computation is subject to the curse of
dimensionality.

Example 2 (Minimal information). On the opposite, one can choose $\mathbf{Y}_t^i = 0$ or any
other constant. The dual stochastic process is then approximated by its expectation
at every instant. Compared to the previous example, there is no conditional
expectation anymore, but one obtains a strategy that corresponds to the vision of
an average price.

Figure 1. Dual Approximate Dynamic Programming. (Diagram: scenario-wise coordination $\lambda_t^{k+1} = \lambda_t^k + \rho_t \sum_{i=1}^N g_t^i(X_t^{i,k}, U_t^{i,k}, W_t)$; each subsystem $i = 1, \dots, N$ computes $\mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i)$ and solves its subproblem.)

Example 3 (In between). One can choose $\mathbf{Y}_t^i$ of the form $h_t^i(\mathbf{W}_t)$. In practice,
this choice will be guided by the intuition one has on which information best
"explains" the optimal price of the system. One has to make a compromise between
enough information to take reasonable actions and an information variable small
enough for the conditional expectation in (7) to remain computable.

Let us move towards the general case where one can choose to keep some information
in memory. In other words, one can choose an information variable that has
Markovian dynamics, i.e. of the form $\mathbf{Y}_{t+1}^i = h_t^i(\mathbf{Y}_t^i, \mathbf{W}_{t+1})$. In order to derive
a DP equation in this case, one has to augment the state vector by embedding $\mathbf{Y}_{t-1}^i$,
that is, the memory necessary to compute the next information variable. Thus,
the Bellman function associated with the $i$-th subproblem depends, at time $t$, on
both $\mathbf{X}_t^i$ and $\mathbf{Y}_{t-1}^i$. The DP equation reads:

$$V_t^i(x, y) = \mathbb{E}\Big( \min_{u} \; C_t^i(x, u, \mathbf{W}_t) + \mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i)^\top g_t^i(x, u, \mathbf{W}_t) + V_{t+1}^i\big(f_t^i(x, u, \mathbf{W}_t), \mathbf{Y}_t^i\big) \Big),$$

with $\mathbf{Y}_t^i = h_{t-1}^i(y, \mathbf{W}_t)$.

When solving this equation, one obtains controls as feedback functions of the local
stock $\mathbf{X}_t^i$, the current noise $\mathbf{W}_t$ and the information variable $\mathbf{Y}_{t-1}^i$ of the previous
time step. The index gap between information and stock variables comes from the
"hazard-decision" setting: at time $t$, the information that is used to take decisions
is the conjunction of the information kept in memory (that has index $t-1$) and of
the noise observed at the current time step, $\mathbf{W}_t$. The sketch of the DADP algorithm
is depicted in Figure 1.
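The loop of Figure 1 can be sketched on a deliberately tiny instance, which is our own assumption and not one of the paper's examples: a single stage, $N$ units with costs $c_i u_i^2/2$, the coupling $\sum_i \mathbf{U}_i = \mathbf{D}$ written with $g^i = \mathbf{U}_i - \mathbf{D}/N$, and $S$ equally likely demand scenarios. The information variable takes finitely many values, so $\mathbb{E}(\cdot \mid \mathbf{Y})$ is an empirical group mean. Because this projection is linear, the sketch stores only the projected price and projects after each scenario-wise gradient step, which is the projected-gradient reading used in the proof of Proposition 2 below. The function name and all parameters are hypothetical.

```python
import numpy as np

def dadp_static(c, d_samples, y_samples, rho=0.5, iters=300):
    """Hypothetical one-stage DADP sketch: min E(sum_i c_i U_i^2 / 2)
    s.t. sum_i U_i = D, coupling g^i = U_i - D / N, price projected on Y."""
    c = np.asarray(c, dtype=float)
    d = np.asarray(d_samples, dtype=float)
    _, groups = np.unique(y_samples, return_inverse=True)
    counts = np.bincount(groups)

    def project(v):
        # Empirical conditional expectation E(v | Y), one value per scenario.
        return (np.bincount(groups, weights=v) / counts)[groups]

    lam = np.zeros_like(d)                      # projected price, one per scenario
    for _ in range(iters):
        # Subproblem i in scenario s: argmin_u c_i u^2 / 2 + lam_s (u - d_s / N).
        u = -lam[:, None] / c[None, :]          # shape (S, N)
        # Scenario-wise gradient step, then projection on Y-measurable prices.
        lam = project(lam + rho * (u.sum(axis=1) - d))
    return u, lam
```

Taking $\mathbf{Y} = \mathbf{D}$ (maximal information) should make the coupling hold scenario by scenario, whereas a constant $\mathbf{Y}$ (minimal information, Example 2) only enforces it on average, in the spirit of the relaxed constraint appearing in Proposition 2.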

Example 4 (Perfect memory). The choice $\mathbf{Y}_t^i = (\mathbf{W}_0, \dots, \mathbf{W}_t)$ falls within the Markovian
case. We then have $\mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i) = \boldsymbol{\lambda}_t$. This choice hence allows us to model
the dual variable perfectly, but the induced DP equation is unsolvable in practice.
Example 5 (Strugarek, 2006). In his PhD thesis, Strugarek exhibited a case where an
exact model for the dual process can be obtained. His example is inspired by the
kind of power management problem mentioned in the introduction, where $N$ water
reservoirs have to contribute to a global power demand, the rest of this demand
being produced by fossil fuel. The noise at each time step $t$ is composed of a scalar
inflow $\mathbf{A}_t^j$ for each reservoir $j = 1, \dots, N$, and of a scalar power demand $\mathbf{D}_t$. The
problem reads:

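$$\min \; \mathbb{E}\Big( \sum_{t=1}^{T-1} \sum_{j=1}^N c_j \frac{(\mathbf{U}_t^j)^2}{2} + \sum_{j=1}^N \gamma_j \frac{(\mathbf{X}_T^j)^2}{2} \Big), \tag{8a}$$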

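where $c_j$, $j = 1, \dots, N$, and $\gamma_j$, $j = 1, \dots, N$, are given real values, subject to
dynamics constraints on reservoirs:

$$\mathbf{X}_{t+1}^j = \mathbf{X}_t^j + \mathbf{A}_{t+1}^j - \mathbf{U}_t^j, \quad \forall t = 1, \dots, T-1, \; \forall j = 1, \dots, N, \tag{8b}$$

the power demand constraint:

$$\sum_{j=1}^N \mathbf{U}_t^j = \mathbf{D}_t, \quad \forall t = 1, \dots, T-1, \tag{8c}$$

and the non-anticipativity constraint:

$$\mathbf{U}_t \text{ is } \sigma(\mathbf{D}_s, s \le t;\, \mathbf{A}_s, s \le t)\text{-measurable}. \tag{8d}$$

Let us denote $\mathbf{A}_t^0 := \sum_{j=1}^N \mathbf{A}_t^j$. The author then shows the following result.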

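Proposition 1 (Strugarek, 2006, Chapter V). If the random variables $(\mathbf{D}_t, \mathbf{A}_t)_{t=1,\dots,T}$
are independent over time, and if there exists $a > 0$ such that $\gamma_j = a c_j$ for all
$j = 1, \dots, N$, then the optimal multiplier $\boldsymbol{\lambda}$ associated with the coupling constraints (8c)
satisfies the following dynamics:

$$\boldsymbol{\lambda}_1 = \frac{1}{\sum_{j=1}^N 1/c_j} \Big( \mathbf{D}_1 (1-a) - a \sum_{s=2}^{T} \mathbb{E}(\mathbf{A}_s^0) - a \sum_{s=2}^{T-1} \mathbb{E}(\mathbf{D}_s) \Big),$$

$$\boldsymbol{\lambda}_{t+1} = \boldsymbol{\lambda}_t + \frac{1}{\sum_{j=1}^N 1/c_j} \Big( \mathbf{D}_{t+1}(1+a) - \mathbf{D}_t - a \mathbb{E}(\mathbf{D}_{t+1}) - a\big(\mathbf{A}_{t+1}^0 - \mathbb{E}(\mathbf{A}_{t+1}^0)\big) \Big), \quad \forall t = 1, \dots, T-2.$$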



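This allows the subproblems to be solved by DP in dimension 3. Note that this
example fits into our approach if one chooses $(\mathbf{Y}_t, \mathbf{D}_t)$ as an information variable,
with:

$$\mathbf{Y}_1 = \frac{1}{\sum_{j=1}^N 1/c_j} \Big( \mathbf{D}_1 (1-a) - a \sum_{s=2}^{T} \mathbb{E}(\mathbf{A}_s^0) - a \sum_{s=2}^{T-1} \mathbb{E}(\mathbf{D}_s) \Big),$$

and, for all $t = 1, \dots, T-2$,

$$\mathbf{Y}_{t+1} = \mathbf{Y}_t + \frac{1}{\sum_{j=1}^N 1/c_j} \Big( \mathbf{D}_{t+1}(1+a) - \mathbf{D}_t - a \mathbb{E}(\mathbf{D}_{t+1}) - a\big(\mathbf{A}_{t+1}^0 - \mathbb{E}(\mathbf{A}_{t+1}^0)\big) \Big).$$

We then get back to the particular case where $\mathbb{E}(\boldsymbol{\lambda}_t \mid \mathbf{Y}_t^i) = \boldsymbol{\lambda}_t$, with a small-dimensional
information variable $\mathbf{Y}_t^i$. Note however that the conditions of Proposition 1, especially
the proportionality relation on costs, make little sense in practice.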
2.2. Convergence. We now give convergence results about DADP and explain in
more detail the relation between the strategies it builds and the solution of the
original problem (1). To make the paper self-contained, we recall in Appendix A
the general results concerning duality in optimization, of which the properties of
DADP are direct consequences.

The approximation made on the dual process gives us a tractable way of computing strategies for each of the subsystems. Depending on the choice of the information variable, some strategies will lead to better results than others, in terms of the value of the dual problem or of the satisfaction of the coupling constraint. Let us now state these facts more precisely.

From now on, we consider a single information variable for all subsystems. We denote it by $Y_t$ and define the Hilbert spaces

$$\mathcal{Y}_t := \big\{\lambda_t \in L^2(\Omega, \mathcal{A}, \mathbb{P}) : \lambda_t \text{ is } Y_t\text{-measurable}\big\},$$

for every $t = 0, \ldots, T-1$.

Proposition 2. Consider the following optimization problem:

(9a) $\min \; \mathbb{E}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N} C_t^i(X_t^i, U_t^i, W_t) + \sum_{i=1}^{N} K^i(X_T^i)\Big),$

subject to the same constraints as in Problem (3) except the coupling constraint (3g), which is replaced by:

(9b) $\mathbb{E}\Big(\sum_{i=1}^{N} g_t^i(X_t^i, U_t^i, W_t) \,\Big|\, Y_t\Big) = 0, \quad \forall t = 0, \ldots, T-1.$

Suppose the Lagrangian associated with Problem (9) has a saddle point. Then 
DADP solves Problem (9). 

Proof. The DADP algorithm consists in:

• given a price process, solving the subproblems using the projection of this price process on $\mathcal{Y}_0 \times \cdots \times \mathcal{Y}_{T-1}$;

• updating the price process using a gradient formula.

Equivalently, one may consider that the gradient formula is composed with the projection operation in the update. Therefore, this algorithm may also be viewed as a projected gradient algorithm that exactly solves the following max-min problem:



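(10a) $\displaystyle\max_{\lambda}\ \min_{X, U}\ \mathbb{E}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N}\big(C_t^i(X_t^i, U_t^i, W_t) + \lambda_t^{\top} g_t^i(X_t^i, U_t^i, W_t)\big) + \sum_{i=1}^{N} K^i(X_T^i)\Big)$

s.t. (10b) $X_{t+1}^i = f_t^i(X_t^i, U_t^i, W_t), \quad \forall t = 0, \ldots, T-1,\ \forall i = 1, \ldots, N,$

(10c) $X_0 = W_0,$

(10d) $\underline{x}_t^i \le X_t^i \le \overline{x}_t^i, \quad \forall t = 1, \ldots, T,\ \forall i = 1, \ldots, N,$

(10e) $\underline{u}_t^i \le U_t^i \le \overline{u}_t^i, \quad \forall t = 0, \ldots, T-1,\ \forall i = 1, \ldots, N,$

(10f) $U_t$ is $\sigma(W_0, \ldots, W_t)$-measurable, $\forall t = 0, \ldots, T-1,$

(10g) $\lambda_t \in \mathcal{Y}_t$, i.e. $\lambda_t$ is $Y_t$-measurable, $\forall t = 0, \ldots, T-1.$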

Observe that the max operation is restricted to a linear subspace defined by (10g).

Now, if within the inner product $\langle a, b \rangle = \mathbb{E}(a^{\top} b)$ the variable $a$ belongs to a given subspace, then the component of $b$ which is orthogonal to that subspace yields zero in the inner product, and hence plays no role. In our context, if $a$ is $Y_t$-measurable, then $\mathbb{E}(a^{\top} b) = \mathbb{E}\big(a^{\top}\,\mathbb{E}(b \mid Y_t)\big)$, so the multiplier $\lambda_t$ can only control the part of $\sum_{i=1}^{N} g_t^i(X_t^i, U_t^i, W_t)$ which has the same measurability as $\lambda_t$, namely its conditional expectation given $Y_t$. Thus, assuming the existence of a saddle point, that is, that the max and min operations can be interchanged in Problem (10), this problem appears as the dual counterpart of Problem (9). □
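The projected-gradient reading of DADP used in this proof can be illustrated on a deliberately tiny example. The Python sketch below is hypothetical throughout: one time step, two units with costs $c_i (U^i)^2/2$, the coupling residual $U^1 + U^2 - D$, a discrete information variable $Y$, and an empirical conditional expectation (group means over samples) standing in for the projection on $Y$-measurable prices. Taking $a = \min_i c_i = 1$ and $c = \sqrt{2}$ (the norm of the linear part of the residual), the bound $2a/c^2$ of Proposition 3 equals 1, and we use $\rho = 0.5$.

```python
import numpy as np

def cond_exp(z, y):
    """Empirical E(z | y) for a discrete y: each entry is replaced by its group mean."""
    out = np.empty(len(z), dtype=float)
    for v in np.unique(y):
        mask = y == v
        out[mask] = z[mask].mean()
    return out

def dadp_toy(c, D, Y, rho, n_iter=50):
    """Price iterations on a hypothetical one-stage problem.

    Unit i minimizes E(c_i / 2 * U_i**2 + lam * U_i); the coupling residual is
    U_1 + ... + U_N - D, and the price lam is kept Y-measurable.
    """
    lam = np.zeros(len(D))
    for _ in range(n_iter):
        U = [-lam / ci for ci in c]                # closed-form subproblem solutions
        residual = sum(U) - D                      # coupling-constraint residual
        lam = cond_exp(lam + rho * residual, Y)    # gradient step, then projection
    return lam, [-lam / ci for ci in c]

rng = np.random.default_rng(0)
Y = rng.integers(0, 3, size=30_000)                       # information variable
D = 10.0 + 5.0 * Y + rng.normal(0.0, 1.0, size=Y.size)    # demand, only partly explained by Y
lam, U = dadp_toy([1.0, 2.0], D, Y, rho=0.5)
residual = U[0] + U[1] - D
```

After convergence, the residual has zero conditional expectation given $Y$ (up to rounding) but stays nonzero scenario by scenario: the toy solves the relaxed problem with a constraint of type (9b), not the almost-sure coupling constraint.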

Loosely speaking, DADP consists in replacing an almost-sure constraint by a constraint involving a conditional expectation with respect to a so-called information variable. It is thus clear once again that, if we choose the information variable $Y_t$ to be the whole history of the system, we recover the initial constraint and in fact solve the original problem. This is the case of Example 4. Conversely, putting no information at all in $Y_t$ amounts to satisfying the coupling constraint only in expectation. This is the case of Example 2. Note however that this is generally a poor way of representing an almost-sure constraint.
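To make the two extremes concrete, here is a small Python sketch in which a single discrete noise stands in for the whole history; the helper `cond_exp`, the residual function and the intermediate grouping are hypothetical. Conditioning on a constant keeps only the mean of the residual, conditioning on the full noise keeps the residual itself, and a coarser grouping lies in between.

```python
import numpy as np

def cond_exp(z, y):
    """Empirical E(z | y) for a discrete y: each entry is replaced by its group mean."""
    out = np.empty(len(z), dtype=float)
    for v in np.unique(y):
        mask = y == v
        out[mask] = z[mask].mean()
    return out

rng = np.random.default_rng(1)
w = rng.integers(0, 6, size=1_000)          # toy noise history: one discrete variable
residual = (w - 2.5) ** 2 - 3.0             # hypothetical coupling residual, a function of w

no_info = cond_exp(residual, np.zeros_like(w))   # constant Y_t: only E(residual) is constrained
full_info = cond_exp(residual, w)                # Y_t = whole history: residual constrained as is
coarse = cond_exp(residual, w // 2)              # intermediate choice: pairs of values merged
```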

The main difficulty is to find an information variable $Y_t$ that leads to a fairly good satisfaction of the coupling constraint while keeping the subproblems tractable.

We now state a convergence result for the DADP algorithm. Let us introduce the objective function $J : \mathcal{U}_0 \times \cdots \times \mathcal{U}_{T-1} \to \mathbb{R}$ associated with a strategy $U$, i.e.:

$$J : U \mapsto \mathbb{E}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N} C_t^i(X_t^i, U_t^i, W_t) + \sum_{i=1}^{N} K^i(X_T^i)\Big),$$

with $X_0 = W_0$ and $X_{t+1}^i = f_t^i(X_t^i, U_t^i, W_t)$, $\forall t = 0, \ldots, T-1$, $\forall i = 1, \ldots, N$.

Proposition 3. If:

(1) $J$ is convex, lower semi-continuous and Gâteaux differentiable,

(2) $J$ is $a$-strongly convex,

(3) all $g_t^i$ are linear and $c$-Lipschitz continuous,

(4) the Lagrangian associated with Problem (9) has a saddle point $(\bar{U}, \bar{\lambda})$,

(5) the step-size $\rho$ of the algorithm is such that $0 < \rho < 2a/c^2$,

then:

(1) there exists a unique solution $\bar{U}$ of Problem (9),

(2) DADP converges in the sense that $U^k \to \bar{U}$ in $\mathcal{U}_0 \times \cdots \times \mathcal{U}_{T-1}$ as $k \to +\infty$,

(3) the sequence $(\lambda^k)_{k \ge 0}$ is bounded and every cluster point $\bar{\lambda}$ in the weak topology is such that $(\bar{U}, \bar{\lambda})$ is a saddle point of the Lagrangian associated with Problem (9).

Proof. The convergence of the algorithm is a direct application of Theorem 1 in Appendix A. □

Note that assumptions (1)–(3) of Proposition 3, together with the qualification of constraint (9b), ensure that the Lagrangian associated with Problem (9) has a saddle point.

3. Numerical experiment 

We now show the efficiency of DADP on two numerical examples. The first one comes from a previous paper (Barty et al., 2010) in which the authors developed a preliminary version of DADP (see §1.4); we show in §3.2 that the new version of DADP performs well on it. The second one, in §3.3, is an application to a more realistic power management problem.
3.1. Computing conditional expectations. Within the DADP procedure, at each iteration, we have to compute the conditional expectations appearing in the criteria (7) of the subproblems. To compute them, we used Generalized Additive Models (GAMs), which were introduced by Hastie and Tibshirani (1990). The estimate takes the form:



$$\mathbb{E}(Z \mid P_1, \ldots, P_n) \approx \sum_{i=1}^{n} f_i(P_i).$$



The functions $f_i$ are splines (piecewise polynomials) whose characteristics are optimized by cross-validation on the input statistical data. We do not explain this methodology in detail here; the interested reader will find further explanations about this model and its implementation in the book by Wood (2006). We used an easy-to-use implementation that is available in the free statistical software R (R Development Core Team, 2009). The GAM toolkit, called mgcv, also returns useful indicators of the quality of the estimation. In particular, we use the deviance indicator, which takes value 0 if $Z$ is estimated as poorly as by its expectation $\mathbb{E}(Z)$ and value 1 if the estimate is exact, i.e. if $\sum_{i=1}^{n} f_i(P_i) = Z$.
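mgcv fits penalized regression splines; the Python sketch below is only a simplified stand-in for illustration, not the mgcv method. It estimates the additive form above by backfitting with piecewise-constant (quantile-bin) smoothers instead of splines, and computes the Gaussian deviance explained, $1 - \sum (Z - \hat Z)^2 / \sum (Z - \bar Z)^2$, which equals 0 when $\hat Z$ is the sample mean and 1 when $\hat Z = Z$. The function names, the number of bins and the synthetic data are our own assumptions.

```python
import numpy as np

def bin_smoother(x, r, n_bins=20):
    """Piecewise-constant smoother: average of r over quantile bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    fitted = np.zeros(len(r))
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            fitted[mask] = r[mask].mean()
    return fitted

def additive_fit(Z, P, n_iter=20, n_bins=20):
    """Backfitting estimate of E(Z | P_1, ..., P_n) as mean(Z) + sum_i f_i(P_i)."""
    n_vars = P.shape[1]
    mean = Z.mean()
    f = np.zeros((len(Z), n_vars))
    for _ in range(n_iter):
        for i in range(n_vars):
            partial = Z - mean - f.sum(axis=1) + f[:, i]   # residual ignoring f_i
            f[:, i] = bin_smoother(P[:, i], partial, n_bins)
            f[:, i] -= f[:, i].mean()                        # centre f_i for identifiability
    return mean + f.sum(axis=1)

def deviance_explained(Z, Z_hat):
    """Gaussian deviance explained: 0 if Z_hat is the sample mean, 1 if Z_hat == Z."""
    return 1.0 - np.sum((Z - Z_hat) ** 2) / np.sum((Z - Z.mean()) ** 2)

rng = np.random.default_rng(0)
P = rng.uniform(size=(5_000, 2))
Z = np.sin(2 * np.pi * P[:, 0]) + P[:, 1] ** 2 + rng.normal(0.0, 0.1, size=5_000)
Z_hat = additive_fit(Z, P)
score = deviance_explained(Z, Z_hat)
```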

Remark 4 (Kernel estimator). We chose to use GAMs to compute conditional expectations after a numerical comparison with the more classical kernel regression methods (Nadaraya, 1964; Watson, 1964), also available in the R environment. Although both gave similar results, GAMs turned out to be several times faster than the kernel method on our problem.

3.2. Back to an example from a previous paper. We first implement the new version of the DADP algorithm on a simple power management problem introduced by Barty et al. (2010). On this small-scale example, we are able to compare DADP results with those obtained by DP and to illustrate the theoretical results described above. Let us first recall this example. Consider a power producer who owns two types of power plants:
types of power plants: 

• Two hydraulic plants, characterized at each time step $t$ by their water stock $X_t^i$ and power production $U_t^i$, which receive water inflows $A_{t+1}^i$, $i = 1, 2$. Such units are usually cost-free. We nevertheless impose small quadratic costs on the hydraulic power productions in order to ensure strong convexity.

• One thermal unit with a production cost that is quadratic with respect to its production $U_t^3$. There are no dynamics associated with this unit.

Using these plants, the power producer must supply a power demand $D_t$ at each time step $t$, over a discrete time horizon of $T = 25$ time steps. All noises, i.e. the demand $D_t$ and the inflows $A_t^1$ and $A_t^2$, are assumed to be independent over time. The interested reader may find more details on this numerical experiment in the previous paper by Barty et al. (2010).
The problem reads:

(11a)–(11d) [equations missing in the extracted source]

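(11e) $0 \le U_t^i \le \overline{u}^i, \quad \forall i = 1, 2,\ \forall t = 0, \ldots, T-1,$

(11f) $0 \le U_t^3, \quad \forall t = 0, \ldots, T-1,$

(11g) $U_t^i$ is $\sigma(D_0, A_0^1, A_0^2, \ldots, D_t, A_t^1, A_t^2)$-measurable, $\forall i = 1, 2, 3.$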

In this problem, the state $X_t$ is two-dimensional, hence DP remains numerically tractable and we can use the DP solution as a reference. In order to use DADP, we choose as information variable at time $t$ the power demand, $Y_t = D_t$. This choice stems from the intuition that the power demand is a "global" piece of information, likely to be useful to all subproblems.

Remark 5 (Primal feasibility). In order to validate the method, it has to be evaluated within a simulation procedure. For the evaluation to be fair, the strategy must be feasible. Yet, as explained in §2.2, DADP does not ensure that the coupling constraint (3f) is satisfied almost surely. To circumvent this difficulty, the thermal unit strategy is chosen in the simulation process so as to ensure feasibility of the coupling constraint, i.e.:

(12) $U_t^3 = D_t - (U_t^1 + U_t^2).$

That is, DADP returns three strategies, one for each hydraulic unit and one for the thermal unit. However, we use relation (12) for the thermal strategy during simulations in order to ensure demand satisfaction and to estimate the cost of the DADP strategy.

We run the algorithm for 20 iterations and depict its behaviour in Figure 2. At each iteration, we plot the dual cost (the dual function evaluated at the current strategy) and the primal cost (the cost of the strategy with all constraints satisfied). Each point of the primal and dual curves is computed by Monte Carlo simulation over 500 scenarios. As expected, the dual function increases steadily, while the primal cost decreases. The gap between the primal and dual costs is an upper bound on the distance to the optimal value, and in this case it appears graphically to be quite tight.
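The evaluation procedure of Remark 5 can be sketched as follows under hypothetical data: the thermal production is recomputed from relation (12), and the Monte Carlo primal cost is reported with the half-width of a normal-approximation 95% confidence interval. The demand law, the hydraulic decisions and the cost coefficients below are placeholders, not the data of Barty et al. (2010).

```python
import numpy as np

def thermal_from_demand(D, U1, U2):
    """Relation (12): the thermal unit covers whatever demand the hydro units leave."""
    return D - (U1 + U2)

def mc_mean_ci(costs, z=1.96):
    """Monte Carlo mean and half-width of an approximate 95% confidence interval."""
    costs = np.asarray(costs, dtype=float)
    return costs.mean(), z * costs.std(ddof=1) / np.sqrt(costs.size)

# Hypothetical data: 500 demand scenarios and hydro decisions from some strategy.
rng = np.random.default_rng(0)
D = rng.normal(10.0, 1.0, size=500)
U1, U2 = 0.3 * D, 0.2 * D
U3 = thermal_from_demand(D, U1, U2)
cost = 0.5 * U3 ** 2 + 0.01 * (U1 ** 2 + U2 ** 2)    # hypothetical quadratic costs
primal_mean, half_width = mc_mean_ci(cost)
```

By weak duality, the dual value is a lower bound and the cost of any feasible strategy an upper bound on the optimal value, so their difference bounds the distance to the optimum; with Monte Carlo estimates, this holds only up to sampling error.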

Moreover, the GAM toolkit used to compute the conditional expectations $\mathbb{E}(\lambda_t \mid D_t)$ returns a deviance, i.e. a measure of how well $\lambda_t$ is explained by $D_t$, of 98.5%. This indicates that the marginal cost of the system is almost perfectly explained by the time variable and the power demand. In other words, using $\mathbb{E}(\lambda_t \mid D_t)$ instead of $\lambda_t$ within Problem (11) does not significantly alter the quality of the solution.

3.3. A larger-scale SOC problem. We now apply DADP to a real-life power management problem, inspired by a case encountered at EDF, the major European power producer. Because of confidentiality issues, we do not give the exact orders of magnitude of costs and productions. We consider:

• a power demand on a single node (we neglect network issues) at each instant 
of a finite time horizon of 163 weeks (one time step per week); 

• 7 (hydraulic) stocks which are in fact aggregations of many smaller stocks; 



• 122 other (thermal) power units with no stock constraints.

Figure 2. Primal, dual and optimal costs with respect to the number of iterations.

All the thermal power units are aggregated, so that the thermal cost $\mathbf{C}_t$ at each time $t$ only depends on the total thermal production $U_t^{\mathrm{th}}$ and is quadratic. We write $\mathbf{C}_t$ in bold to indicate that this thermal cost is random, because of the breakdowns that may occur in thermal power plants.
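The problem reads:

(13a) [objective missing in the extracted source]

subject to hydraulic stock dynamics:

(13b) $X_0^i = x_0^i, \quad \forall i = 1, \ldots, 7,$

(13c) $X_{t+1}^i = X_t^i - U_t^i + A_t^i, \quad \forall i = 1, \ldots, 7,\ \forall t = 0, \ldots, T-1,$

power demand constraints:

(13d) $\sum_{i=1}^{7} U_t^i + U_t^{\mathrm{th}} = D_t, \quad \forall t = 0, \ldots, T-1,$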
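bound constraints on stocks and controls:

(13e) $\underline{u}_t^{\mathrm{th}} \le U_t^{\mathrm{th}} \le \overline{u}_t^{\mathrm{th}}, \quad \forall t = 0, \ldots, T-1,$

(13f) $\underline{u}_t^i \le U_t^i \le \overline{u}_t^i, \quad \forall i = 1, \ldots, 7,\ \forall t = 0, \ldots, T-1,$

(13g) $\underline{x}_t^i \le X_t^i \le \overline{x}_t^i, \quad \forall i = 1, \ldots, 7,\ \forall t = 0, \ldots, T,$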



and non-anticipativity constraints:

(13h) $U_t^i$ is $\sigma(W_0, \ldots, W_t)$-measurable, $\forall i = 1, \ldots, 7,\ \forall t = 0, \ldots, T-1,$

(13i) $U_t^{\mathrm{th}}$ is $\sigma(W_0, \ldots, W_t)$-measurable, $\forall t = 0, \ldots, T-1,$

with $W_t := (A_t, \mathbf{C}_t, D_t)$ being the set of all noises that affect the system at time $t$.

Because there are 7 stocks, DP cannot be applied directly to this problem. In order to obtain a reference point, we use an aggregation method introduced by Turgeon (1980) and currently in use at EDF. This numerical method is known to be especially well suited to the problem under consideration. It consists in solving $N$ subproblems (7 in our case) by 2-dimensional DP, each subproblem relying on a particular power unit, instead of one $N$-dimensional DP problem. The idea is, for every unit, to look for strategies that depend on the stock of the unit and on an aggregation of the remaining stocks.

We then apply DADP with three different choices for the information variable $Y_t$.

• In the first setting, we replace the price at each time step by its expectation. In other words, we explain the price only by the time variable $t$. According to Proposition 3, we are in fact solving Problem (13) with constraint (13d) replaced by its expectation. We are then able to solve each subproblem $i$ by DP in dimension 1 (the stock variable of unit $i$) and we obtain strategies that depend, for each unit $i$ and each instant $t$, on the stock $X_t^i$ and the inflow $A_t^i$.

• In the second setting, we replace the price at each time step by its conditional expectation with respect to the power demand. Put differently, we explain the price by time and demand. We still have to solve a 1-dimensional DP equation, and we obtain for each instant $t$ a strategy that depends on $X_t^i$, $A_t^i$ and $D_t$.

• In the third setting, we replace the price at each instant by its conditional expectation with respect to the power demand and the thermal availability⁵ $P_t$. We then obtain a strategy that depends, for every unit $i$ and every instant $t$, on $X_t^i$, $A_t^i$, $D_t$ and $P_t$.

The behaviour of the algorithm in the second setting is depicted in Figure 3. We observe the increase of the dual value and the decrease of the primal value, the latter quickly stabilizing near the value obtained by the aggregation method. Although 10 iterations are generally far too few for this kind of primal-dual algorithm, the primal cost does not seem to evolve significantly after 10 iterations.

In order to compare the three settings, we simulate the corresponding strategies⁶ on a large set of i.i.d. noise scenarios and compute both the mean cost and the confidence interval for each strategy. The results are presented in Table 1. The "Deviance" column gives the deviance indicator returned by the GAM procedure


⁵ The thermal availability is a scalar variable computed from the thermal cost function $\mathbf{C}_t$. It indicates how tight the thermal generation mix is.

⁶ As in the previous example, the thermal unit strategy is chosen so as to ensure feasibility of the coupling constraint (see Remark 5).
Figure 3. Primal and dual costs over the iterations, compared to the aggregation method.

                  Mean cost   Confidence interval   Deviance
First setting     2.363       1.3 · 10^-…           50.0%
Second setting    2.340       1.3 · 10^-…           82.4%
Third setting     2.338       1.3 · 10^-…           86.1%

Table 1. Results for DADP

Figure 4. Distribution of cost differences between settings of DADP.



for the estimation of the conditional expectation of the price with respect to the information variable. We observe that the DADP strategy benefits from a good choice of the information variable $Y_t$: the comparison of mean costs shows that adding information to the estimator improves the quality of the estimation. The mean costs of the last two experiments are however not easy to compare, because the confidence interval is large compared to the difference between them. We therefore compute, for each scenario, the gap between the costs obtained by two different strategies and draw in Figure 4 the associated probability distributions. It then becomes clearer that adding the thermal availability to the information variable improves the strategy: when comparing settings 2 and 3, most of the probability weight is on negative values.

Figure 5. Distribution of the production/demand gap for a given time step.
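Why a scenario-by-scenario comparison, as used for Figure 4, is more informative than comparing mean costs can be seen on synthetic numbers (not the paper's data). When most of the cost variance comes from the noise scenario and is shared by both strategies, the confidence interval on the difference of two separately estimated means is much wider than the one computed from per-scenario differences. The Python sketch below assumes this common-plus-small-specific cost structure.

```python
import numpy as np

def ci_half_width(x, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval on the mean."""
    x = np.asarray(x, dtype=float)
    return z * x.std(ddof=1) / np.sqrt(x.size)

# Synthetic costs: a large scenario-driven part shared by both strategies,
# plus a small strategy-specific part.
rng = np.random.default_rng(0)
n = 10_000
common = rng.normal(100.0, 5.0, size=n)              # depends only on the noise scenario
cost_a = common + rng.normal(0.0, 0.02, size=n)
cost_b = common + 0.05 + rng.normal(0.0, 0.02, size=n)

diff = cost_a - cost_b                               # per-scenario gap
unpaired = np.hypot(ci_half_width(cost_a), ci_half_width(cost_b))
paired = ci_half_width(diff)
share_negative = np.mean(diff < 0)                   # probability weight on "a is cheaper"
```

Here the mean difference (0.05) is smaller than the unpaired half-width but much larger than the paired one.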

As a last point, let us numerically verify that Proposition 3 holds in our example, for instance in the first setting. Recall that, in this case, our algorithm aims at satisfying the coupling constraint only in expectation. We draw in Figure 5 the probability distribution of the production/demand gap at several iterations. We observe that, as the iterations proceed, the distribution of this gap becomes symmetric with respect to 0, so that its expectation is close to zero.

Conclusion 

We presented an original algorithm for solving a certain class of large-scale stochastic optimal control problems. It is based on an approximate Lagrangian decomposition: the Lagrange multiplier, which is a stochastic process in this context, is projected using a conditional expectation with respect to another stochastic process called the information process. This information process is chosen a priori and, when it has a limited memory, solving the subproblems becomes tractable. We give theoretical results concerning the convergence of the algorithm and show how it actually solves an approximate problem, whose relation to the original problem depends on the choice of the information variable. Finally, we show the efficiency of the approach on two numerical examples.

Future work will address the application of this algorithm to more general problem structures, such as chained subsystems or networks.

Appendix A. Duality in convex optimization 

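The results presented here come from the paper by Cohen (1980a). Let $\mathcal{U}$ and $\Lambda$ be Hilbert spaces⁷, and let $\mathcal{U}^{ad}$ and $\Lambda^{ad}$ be subsets of $\mathcal{U}$ and $\Lambda$ respectively. Moreover, let $L : \mathcal{U} \times \Lambda \to \mathbb{R}$ be a given function. We describe here the relations that link the so-called primal problem:

(14) $\displaystyle\inf_{u \in \mathcal{U}^{ad}}\ \sup_{\lambda \in \Lambda^{ad}} L(u, \lambda),$

to its dual counterpart:

$$\sup_{\lambda \in \Lambda^{ad}}\ \inf_{u \in \mathcal{U}^{ad}} L(u, \lambda).$$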



⁷ These results can be generalized to Banach spaces (see Ekeland and Temam, 1999), but this is not necessary for our purpose.
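$\mathcal{U}$ is called the primal space, while $\Lambda$ is called the dual one.

Definition 1 (Saddle point). A pair $(\bar{u}, \bar{\lambda}) \in \mathcal{U}^{ad} \times \Lambda^{ad}$ is called a saddle point of $L$ on $\mathcal{U}^{ad} \times \Lambda^{ad}$ if:

$$L(\bar{u}, \lambda) \le L(\bar{u}, \bar{\lambda}) \le L(u, \bar{\lambda}), \quad \forall u \in \mathcal{U}^{ad},\ \forall \lambda \in \Lambda^{ad}.$$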

Let us now concentrate on the case where the function $L$ is the Lagrangian of an optimization problem:

$$L(u, \lambda) = J(u) + \langle \lambda, g(u) \rangle.$$

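The Uzawa algorithm is defined as follows. Take an initial value $\lambda^0 \in \Lambda^{ad}$. At each iteration $n \ge 0$, compute $u^n$ by minimizing $J(u) + \langle \lambda^n, g(u) \rangle$ over $\mathcal{U}^{ad}$, and update $\lambda^n$ using the following rule:

$$\lambda^{n+1} = \Pi_{\Lambda^{ad}}\big(\lambda^n + \rho^n g(u^n)\big),$$

where $\rho^n$ is some positive value and $\Pi_{\Lambda^{ad}}$ denotes the projection onto $\Lambda^{ad}$. The following theorem gives conditions for the sequence $(u^n)_{n \ge 0}$ to converge to the optimum of Problem (14).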

Theorem 1 (Cohen, 1980a, Theorem 6.1). If:

(1) $J$ is convex, lower semi-continuous and Gâteaux differentiable,

(2) $J$ is $a$-strongly convex,

(3) $g$ is linear and $c$-Lipschitz continuous,

(4) $L$ has at least one saddle point $(\bar{u}, \bar{\lambda})$,

(5) the step-size $\rho$ of the algorithm is such that $0 < \rho < 2a/c^2$,

then:

(1) $\bar{u}$ is unique and is a solution of Problem (14),

(2) Uzawa's algorithm converges in the sense that $u^n \to \bar{u}$ in $\mathcal{U}$ as $n \to +\infty$,

(3) the sequence $(\lambda^n)_{n \ge 0}$ is bounded and every cluster point $\bar{\lambda}$ in the weak topology is such that $(\bar{u}, \bar{\lambda})$ is a saddle point of $L$.

Given the other assumptions of the theorem, assumption (4) is satisfied as long as the dualized constraint satisfies a so-called "qualification" condition. In addition, the latter is always satisfied for affine constraints, which is the case in our application.

Appendix B. A lemma about decomposition 

We explain here in more detail why a Stochastic Optimal Control (SOC) problem involving $N$ independent⁸ subsystems is equivalent, under certain conditions, to $N$ problems, each of which involves only one of the subsystems. Although this result may seem trivial at first sight, it is not true in general: the interested reader will find a counterexample in the paper by Cohen (1980b).

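Lemma 1. Consider the following problem:

(15a) $\min \; \mathbb{E}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N} C_t^i(X_t^i, U_t^i, W_t^i, Z_t) + \sum_{i=1}^{N} K^i(X_T^i)\Big)$

subject to dynamics constraints:

(15b) $X_{t+1}^i = f_t^i(X_t^i, U_t^i, W_t^i, Z_t), \quad \forall t = 0, \ldots, T-1,\ \forall i = 1, \ldots, N,$

(15c) $X_0^i$ is given, $\forall i = 1, \ldots, N,$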



⁸ In a sense that is made clear in Lemma 1.
as well as bound constraints:

(15d) $\underline{x}_t^i \le X_t^i \le \overline{x}_t^i, \quad \forall t = 0, \ldots, T,\ \forall i = 1, \ldots, N,$

(15e) $\underline{u}_t^i \le U_t^i \le \overline{u}_t^i, \quad \forall t = 0, \ldots, T-1,\ \forall i = 1, \ldots, N,$

and the non-anticipativity constraint:

(15f) $U_t^i$ is $\mathcal{A}_t$-measurable, $\forall t = 0, \ldots, T-1,\ \forall i = 1, \ldots, N,$

where $\mathcal{A}_t$ is the σ-algebra generated by the random variables $(W_s^i, Z_s)$ for $i = 1, \ldots, N$ and $s = 0, \ldots, t$. We assume that:

• the $W^i$'s and $Z$ are all white noise processes,

• $W_t^i$ is not necessarily independent of $W_t^j$ for $j \ne i$, nor of $Z_t$.

Then the optimal feedback solution is partially decentralized, that is, each optimal decision $U_t^i$, which may a priori depend on the whole $X_t$, the whole $W_t$ and $Z_t$ according to (15f), in fact only depends on $(X_t^i, W_t^i, Z_t)$; the Bellman function $V_t(X_t)$ is additive ($V_t(X_t) = \sum_{i=1}^{N} V_t^i(X_t^i)$); and the optimal solution only involves the marginal probability laws of the pairs $(W_t^i, Z_t)$, not the joint probability law of $(W_t, Z_t)$.

Proof. The proof is by induction over time. The statement that $V$ is additive is true at the final time $T$ since the final cost $K$ is additive. Assume it is true from $T$ down to $t + 1$. The Bellman equation at $t$ reads:

$$V_t(x) = \mathbb{E}\Big(\min_{u}\ \sum_{i=1}^{N} C_t^i(x^i, u^i, W_t^i, Z_t) + \sum_{i=1}^{N} V_{t+1}^i\big(f_t^i(x^i, u^i, W_t^i, Z_t)\big)\Big),$$

in which

• the minimization operation is done over an expression in which $x$, $Z_t$ and $W_t$ are fixed (hazard-decision scheme), and the arg min in $u$ depends parametrically on those values (which yields the optimal feedback function);

• the minimization operation is subject to the bound constraints (15e) for $u^i$ and (15d) for $f_t^i(x^i, u^i, W_t^i, Z_t)$;

• the expectation concerns the random variables $(W_t, Z_t)$, whereas $x$ is still fixed ($X_t$ and $(W_t, Z_t)$ are independent of each other, so this expectation may be considered as a conditional expectation knowing that $X_t = x$): this yields a function of $x$, namely $V_t(\cdot)$.

Now observe that, at the minimization stage, each $u^i$ is involved in a separate expression depending only on $x^i$, $W_t^i$ and $Z_t$, subject also to independent constraints, hence the claimed partially decentralized optimal feedback. Then, at the outer expectation stage, we get a sum of functions of $x^i$ and $(W_t^i, Z_t)$: thus only the marginal probability law of each pair $(W_t^i, Z_t)$ is involved in the expectation of the corresponding term in this sum, and the result is an additive function of the $x^i$, which completes the proof by induction. □
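A brute-force check of Lemma 1 on a tiny instance may help fix ideas. The Python sketch below is hypothetical throughout (state sets, costs, dynamics and noise law are ours, and $Z_t$ is absent): it runs the hazard-decision Bellman recursion once on the joint state with a correlated joint law of $(W_t^1, W_t^2)$, and once per subsystem using only the marginal laws. By the lemma, the joint value function should coincide with the sum of the two separate ones.

```python
import itertools
import numpy as np

X = range(4)                      # admissible stock levels 0..3 (hypothetical)
U = range(3)                      # candidate controls 0..2 (hypothetical)
T = 3                             # number of stages
# Correlated joint law of (W^1, W^2) on {0, 1}^2: the noises are NOT independent.
p_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

def stage_cost(i, x, u, w):
    return (i + 1) * (u - 1) ** 2 + 0.5 * x - 0.3 * w * u   # hypothetical C_t^i

def next_state(x, u, w):
    return x + w - u                                         # hypothetical f_t^i

def final_cost(x):
    return (x - 2) ** 2                                      # hypothetical K^i

def separate_dp(i, p_marg):
    """Backward DP for subsystem i alone, using only the marginal law of W^i."""
    V = np.array([final_cost(x) for x in X], dtype=float)
    for _ in range(T):
        V = np.array([sum(p_marg[w] * min(stage_cost(i, x, u, w) + V[next_state(x, u, w)]
                                          for u in U if next_state(x, u, w) in X)
                          for w in (0, 1))
                      for x in X])
    return V

def joint_dp():
    """Backward DP on the joint state (x1, x2) with the joint noise law."""
    V = np.array([[final_cost(x1) + final_cost(x2) for x2 in X] for x1 in X], dtype=float)
    for _ in range(T):
        V_new = np.zeros_like(V)
        for x1, x2 in itertools.product(X, X):
            for w1, w2 in itertools.product((0, 1), (0, 1)):
                best = min(stage_cost(0, x1, u1, w1) + stage_cost(1, x2, u2, w2)
                           + V[next_state(x1, u1, w1), next_state(x2, u2, w2)]
                           for u1 in U for u2 in U
                           if next_state(x1, u1, w1) in X and next_state(x2, u2, w2) in X)
                V_new[x1, x2] += p_joint[w1, w2] * best   # hazard-decision: min inside E
        V = V_new
    return V

V_joint = joint_dp()
V_sum = (separate_dp(0, p_joint.sum(axis=1))[:, None]
         + separate_dp(1, p_joint.sum(axis=0))[None, :])
```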

Let us now comment on some particular cases.

• If $Z_t$ is absent and if $W^i$ and $W^j$ are independent whenever $j \ne i$, then the overall problem is obviously made up of $N$ independent subproblems; the optimal feedbacks are fully decentralized (that is, $U^i$ is in closed loop on $(X^i, W^i)$), and the optimal controls $U^i$ and $U^j$ are also independent random variables whenever $j \ne i$.

• If we drop the independence assumption on $W^i$ and $W^j$, then the same subproblems still provide the overall problem solution with decentralized feedbacks, but $U^i$ and $U^j$ are no longer independent.
• Another "extreme" situation is when only the "shared" noise $Z$ is present in all subsystems (the $W^i$'s are assumed absent for the sake of clarity, but $Z$ may now be thought of as the concatenation of all the $W^i$'s). The conclusions of the lemma are of course still valid, that is, the Bellman function is still additive and each term of this sum can be calculated in a separate subproblem, yielding a feedback on $(X^i, Z)$. However, the price to be paid for the presence of this shared random variable is that, first, the minimization operation in the Bellman equation is parametrized by both $x^i$ and $Z_t$, which may be costly if $Z_t$ is of large dimension, and, second, the outer expectation in this Bellman equation involves a multiple integral over the vector $Z_t$, which may also be costly.